Computer and Modernization ›› 2010, Vol. 1 ›› Issue (6): 137-139. doi: 10.3969/j.issn.1006-2475.2010.06.039

• Network and Communication •

Improvement of Weight of Web Page Features in Calculation Based on VSM

LI Zhong-yuan, YANG Shou-wen   

  1. College of Computer Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
  • Received: 2010-02-05 Revised: 1900-01-01 Online: 2010-07-01 Published: 2010-07-01

Abstract: This paper applies the classical vector space model (VSM) to the text classification of Web pages. The traditional TFIDF weighting formula has several problems when used to weight Web page keywords, notably that it discriminates poorly between keywords. The proposed method divides the Web page structure into two parts: one containing the title, metadata, link anchor text, and page keywords, and the other containing the page body. The weighting of keywords is strengthened accordingly. Because the weights for the page body are computed with an improved IDF, the ability of keywords to discriminate between classes is improved to a certain extent. Experiments show that the method is feasible.

Key words: VSM, feature representation, TFIDF
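
The two-part weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the boost applied to structural keywords nor the improved IDF formula, so the constant STRUCT_BOOST, the document layout ("struct" and "body" token lists), and the use of a standard smoothed IDF as a stand-in are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical boost for terms found in the structural part (title, metadata,
# anchor text, page keywords). The abstract does not state a value.
STRUCT_BOOST = 2.0


def idf(term, docs):
    """Standard smoothed IDF; a stand-in for the paper's improved IDF,
    whose formula is not given in the abstract."""
    df = sum(1 for d in docs if term in d["struct"] or term in d["body"])
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def weight_vector(doc, docs):
    """Map each term of one page to a TF-IDF weight, counting occurrences
    in the structural part STRUCT_BOOST times as much as body occurrences."""
    struct_tf = Counter(doc["struct"])
    body_tf = Counter(doc["body"])
    weights = {}
    for term in set(struct_tf) | set(body_tf):
        tf = STRUCT_BOOST * struct_tf[term] + body_tf[term]
        weights[term] = tf * idf(term, docs)
    return weights


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (VSM)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus: each page is split into a structural part and a body part.
docs = [
    {"struct": ["python", "tutorial"], "body": ["python", "code", "loop"]},
    {"struct": ["football", "news"], "body": ["match", "goal", "news"]},
]
vectors = [weight_vector(d, docs) for d in docs]
```

In this sketch a term that appears in the title or metadata contributes more to the page vector than the same term in the body, which is one simple way to strengthen keyword weights. Replacing idf with a class-aware variant would be the place to plug in an improved IDF.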
